What is an Application Programming Interface (API)?
An Application Programming Interface (API) is a way for two computer programs to speak to each other.
In modern software development, APIs are used extensively when:
two programs are not on the same machine
two applications are not written in the same language
the inner workings of a piece of software should be hidden, but its functionality is offered for customization
a graphical user interface would be inconvenient at scale
There are several important types (SOAP, GraphQL, etc.), but we will focus on REST (Representational State Transfer) APIs
APIs are commonly used to distribute data, among many other purposes
A few prominent examples:
the Twitter and Facebook APIs (both effectively defunct)
the ChatGPT API, which is used to build many additional services
news APIs like The Guardian and NYT
financial APIs
translation APIs (Google, Bing and DeepL)
Parts of an API call
API calls usually combine several elements:
a base URL of the service (e.g., https://api.openai.com/)
an endpoint for a specific service, usually accessed through a sub-directory (e.g., /v1/completions)
an API method: GET, POST, PUT, DELETE, etc. (only GET and sometimes POST are important for us)
headers containing some settings, e.g., what format you want to receive the data in (JSON, XML, HTML etc.), and communicating who you are through user-agent, cookies, device and software information that is usually used for debugging
query parameters, i.e., your search term, filters, what fields/columns you want to access, how many results you want to receive, how results are ordered, etc. (e.g., ?q=parliament%20AND%20debate)
a body if your request contains some more complicated instructions (not for GET requests)
authentication, usually in the form of a token (a standardized string, similar to a password)
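The elements listed above can be put together in a short sketch. This is only an illustration: the host api.example.com, the endpoint v1/items, the limit parameter and the token are made-up placeholders, and the request is only built, never sent.

```r
library(httr2)

# Hypothetical token; a real one would be issued by the service provider
example_token <- "my-secret-token"

req <- request("https://api.example.com") |>                 # base URL (placeholder host)
  req_url_path("v1/items") |>                                # endpoint (hypothetical)
  req_method("GET") |>                                       # API method
  req_headers(Accept = "application/json") |>                # header: format we want back
  req_url_query(q = "parliament AND debate", limit = 10) |>  # query parameters
  req_auth_bearer_token(example_token)                       # authentication via a token

# The request object stores the assembled URL and the method
req$url
req$method
```

Nothing is transmitted here; the object only describes what would be sent once the request is performed.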
Parts of an API response
APIs respond to a call. The response usually also contains several elements:
a status code: 200s mean success, 300s mean redirection (the resource is available elsewhere), 400s are request errors on the client side (not found, forbidden), 500s are server errors
headers provide additional information about the response (e.g., type of data returned, size of the data, time stamp)
body: the main response containing the requested data
response metadata: more information about the response (e.g., pagination information, version numbers, remaining rate limit allowance, link to next page)
error messages: when unsuccessful, the API might include an error message on top of the status code
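As a rough sketch of how the status code can be interpreted in code, the helper below maps a numeric code to its standard HTTP class. The function name describe_status is made up for this illustration and is not part of any package.

```r
# Map an HTTP status code to its standard class (hypothetical helper)
describe_status <- function(code) {
  stopifnot(is.numeric(code), length(code) == 1)
  if (code >= 100 && code < 200) {
    "informational"
  } else if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 300 && code < 400) {
    "redirection"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "unknown"
  }
}

describe_status(200)
describe_status(404)
describe_status(503)
```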
Accessing APIs from R
The httr2 package
a rewrite of httr, which was the de facto default for developing API packages in R
developed by Hadley Wickham
tidyverse programming principles
telling verbs are used in a pipe
requests are built up using req_* functions
responses are deconstructed using resp_*
makes wrapping an API in a few functions or a package straightforward
Example: The Guardian API
Background
The newspaper The Guardian offers all its articles through an open API for free 🤓
To access the API, you first need to obtain an API key by filling out a small form here
The API key should arrive within seconds by email
Such free and open access is unfortunately very rare in the world of news media ☹️
To figure out how to use the API, we can use its documentation
Your task: get a key and use usethis::edit_r_environ(scope = "project") to open your .Renviron file. Save the API key as the variable GUARDIAN_KEY.
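Once the key is saved, it can be useful to check that R actually sees it before building any request. The helper below is a minimal sketch; the name get_api_key is an assumption made for this example, and only base R is used.

```r
# Read an API key from an environment variable and fail early if it is missing
# (get_api_key is a hypothetical helper, not part of httr2 or usethis)
get_api_key <- function(var = "GUARDIAN_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(
      "Environment variable ", var, " is not set. ",
      "Add it to .Renviron and restart R.",
      call. = FALSE
    )
  }
  key
}

# Usage (errors with a clear message if the key has not been saved yet):
# key <- get_api_key()
```

Remember that .Renviron is only read when R starts, so a restart is needed after editing the file.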
Building Requests
Let’s build our first httr2 request!
library(httr2)
library(tidyverse, warn.conflicts = FALSE)
req <- request("https://content.guardianapis.com") |>  # start the request with the base URL
  req_url_path("search") |>                             # navigate to the endpoint you want to access
  req_method("GET") |>                                  # specify the method
  req_timeout(seconds = 60) |>                          # how long to wait for a response
  req_headers("User-Agent" = "httr2 guardian test") |>  # specify request headers
  # req_body_json() |>                                  # since this is a GET request the body stays empty
  req_url_query(                                        # instead the query is added to the URL
    q = "parliament AND debate",
    "show-blocks" = "all"
  ) |>
  req_url_query(                                        # in this case, the API key is also added to the query
    "api-key" = Sys.getenv("GUARDIAN_KEY")              # but httr2 also has req_auth_* functions for other
  )                                                     # authentication procedures
print(req)
We have now built the request, but nothing is sent until we also perform it.
Performing the request
resp <- req_perform(req)
resp
Printing the response tells us several important things:
the status of the response is OK (hurray!)
the response carries data in the JSON format
however, you probably don’t want to manually inspect each response…
Parsing the response: a first look
We can automatically check if the response has the form we expect:
resp_status(resp) < 400
[1] TRUE
resp_content_type(resp) == "application/json"
[1] TRUE
[1] TRUE
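These two checks could be bundled into one small helper that stops with a readable message when something is off. The sketch below uses mock responses created with httr2's response() constructor, so it runs without a key or network access; the name check_json_response is an assumption for this example.

```r
library(httr2)

# Stop early if the response is an error or does not carry JSON
# (check_json_response is a hypothetical helper)
check_json_response <- function(resp) {
  if (resp_status(resp) >= 400) {
    stop("Request failed with status ", resp_status(resp), call. = FALSE)
  }
  if (!identical(resp_content_type(resp), "application/json")) {
    stop("Expected JSON but got ", resp_content_type(resp), call. = FALSE)
  }
  invisible(TRUE)
}

# Mock responses for illustration only
ok_resp  <- response(status_code = 200, headers = "Content-Type: application/json")
bad_resp <- response(status_code = 404, headers = "Content-Type: application/json")

check_json_response(ok_resp)
tryCatch(check_json_response(bad_resp), error = function(e) conditionMessage(e))
```

Note that req_perform() already turns HTTP errors into R errors by default, so the status check mainly matters when that behaviour has been switched off.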
If we’re happy with the status of the response, we can start to look at the body by transforming it with the correct resp_body_* function (here resp_body_json, since the response carries JSON):
returned_body <- resp_body_json(resp)
We already see some useful information about the result. We could extract that information either with pluck from the tidyverse or using square brackets:
pluck(returned_body, "response", "total")
[1] 32109
pluck(returned_body, "response", "pageSize")
[1] 10
pluck(returned_body, "response", "pages")
[1] 3211
returned_body[["response"]][["total"]]
[1] 32109
returned_body[["response"]][["pageSize"]]
[1] 10
returned_body[["response"]][["pages"]]
[1] 3211
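The three numbers are related: the number of pages follows from the total number of hits and the page size. The sketch below rebuilds a small mock of the parsed body from the values printed above (the real object contains far more fields) and checks this with base R.

```r
# Mock of the parsed body, containing only the three fields shown above
returned_body_mock <- list(
  response = list(total = 32109, pageSize = 10, pages = 3211)
)

total     <- returned_body_mock[["response"]][["total"]]
page_size <- returned_body_mock[["response"]][["pageSize"]]

# Each page holds up to page_size results, so the last page may be partly filled
expected_pages <- ceiling(total / page_size)
expected_pages == returned_body_mock[["response"]][["pages"]]
```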
Parsing the response: extracting the data
So far we only have the results for page 1; returning results page by page is a common pattern for APIs. To get to the other pages that contain results, we would need to loop through all of these pages (by adding the query page = i). For now, we can have a closer look at the articles on the first results page.
Assuming these articles have been extracted into search_res (for example with pluck(returned_body, "response", "results")), we can inspect them using the Viewer in RStudio:
View(search_res)
In typical fashion, this API returns the data in a rather complicated format. This is probably the main reason why people dislike working with APIs in R, as it can be very frustrating to get this into a format that makes sense for us.
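The loop over result pages mentioned earlier could be prepared roughly as follows. This is a sketch under assumptions: the helper name guardian_page_request is made up, only the first three pages are prepared, and the requests are built but not performed, so no key or network access is needed to run it.

```r
library(httr2)

# Build the request for one results page (hypothetical helper)
guardian_page_request <- function(query, page, key = Sys.getenv("GUARDIAN_KEY")) {
  request("https://content.guardianapis.com") |>
    req_url_path("search") |>
    req_url_query(q = query, page = page, "api-key" = key)
}

# Prepare requests for pages 1 to 3
page_reqs <- lapply(1:3, function(i) {
  guardian_page_request("parliament AND debate", page = i)
})

# Performing them requires a valid key and an internet connection:
# page_resps  <- lapply(page_reqs, req_perform)
# page_bodies <- lapply(page_resps, resp_body_json)
```

With more than 3,000 pages available for this query, a real loop should pause between requests and respect the rate limit of the service.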
Parsing the response: building a data wrangling function
Let’s build a function to select just some important information. We start by writing a few lines of code to parse the first article:
res <- pluck(search_res, 1)
res
$id
[1] "world/2024/dec/10/queensland-parliament-passes-unprecedented-gag-on-abortion-debate"
$type
[1] "article"
$sectionId
[1] "world"
$sectionName
[1] "World news"
$webPublicationDate
[1] "2024-12-10T03:48:32Z"
$webTitle
[1] "Queensland parliament passes ‘unprecedented’ gag on abortion debate"
$webUrl
[1] "https://www.theguardian.com/world/2024/dec/10/queensland-parliament-passes-unprecedented-gag-on-abortion-debate"
$apiUrl
[1] "https://content.guardianapis.com/world/2024/dec/10/queensland-parliament-passes-unprecedented-gag-on-abortion-debate"
$blocks
$blocks$main
$blocks$main$id
[1] "6757da258f08cefeb937f379"
$blocks$main$bodyHtml
[1] "<figure class=\"element element-atom\"> <gu-atom data-atom-id=\"be524d7d-2d98-491e-86ff-653848fc4329\" data-atom-type=\"media\" > </gu-atom> </figure>"
$blocks$main$bodyTextSummary
[1] ""
$blocks$main$attributes
named list()
$blocks$main$published
[1] TRUE
$blocks$main$createdDate
[1] "2024-12-10T06:05:25Z"
$blocks$main$firstPublishedDate
[1] "2024-12-10T03:59:44Z"
$blocks$main$publishedDate
[1] "2024-12-10T06:05:31Z"
$blocks$main$lastModifiedDate
[1] "2024-12-10T06:05:30Z"
$blocks$main$contributors
list()
$blocks$main$elements
$blocks$main$elements[[1]]
$blocks$main$elements[[1]]$type
[1] "contentatom"
$blocks$main$elements[[1]]$assets
list()
$blocks$main$elements[[1]]$contentAtomTypeData
$blocks$main$elements[[1]]$contentAtomTypeData$atomId
[1] "be524d7d-2d98-491e-86ff-653848fc4329"
$blocks$main$elements[[1]]$contentAtomTypeData$atomType
[1] "media"
$blocks$body
$blocks$body[[1]]
$blocks$body[[1]]$id
[1] "6757a1a68f0805d974b99384"
$blocks$body[[1]]$bodyHtml
[1] "<p>The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”.</p> <p>The motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order.</p> <p>Opposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”.</p> <p>There was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through.</p> <ul> <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s breaking news email</a></strong></p></li> </ul> <p>The issue <a href=\"https://www.theguardian.com/australia-news/2024/oct/13/queensland-election-2024-lnp-abortion-policy-david-crisafulli\">dominated the recent Queensland election campaign</a>, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time.</p> <p>Crisafulli said his motion ended a “disgraceful … US-style scare campaign”.</p> <p>“I said from day one, it was not part of our plan,” Crisafulli said.</p> <p>“I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.”</p> <p>Crisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion.</p> <p>The opposition leader, Steven Miles, said the move was “extraordinary”.</p> <p>“Mr Speaker, these are extraordinary scenes. 
I thought I’d never seen anything as extraordinary <a href=\"https://www.theguardian.com/australia-news/2024/nov/28/queensland-indigenous-truth-telling-inquiry-head-joshua-creamer\">as what they did at the last sitting</a>. But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].”</p> <p>Several Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.</p> <aside class=\"element element-rich-link element--thumbnail\"> <p> <span>Related: </span><a href=\"https://www.theguardian.com/environment/2024/dec/09/anti-renewable-energy-campaigns-liberal-coalition-labor-ntwnfb\">How anger at Australia’s rollout of renewables is being hijacked by a new pro-nuclear network</a> </p> </aside> <p>“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible.</p> <p>“This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. 
Shame.”</p> <p>Crossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue.</p> <p>It has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death.</p> <p>At a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”.</p> <p>“I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said.</p> <p>UQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament.</p> <p>“But this is quite odd. Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said.</p> <p>“It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.”</p> <p>Orr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity.</p> <p>All 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it.</p>"
$blocks$body[[1]]$bodyTextSummary
[1] "The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”. The motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order. Opposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”. There was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through. Sign up for Guardian Australia’s breaking news email The issue dominated the recent Queensland election campaign, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time. Crisafulli said his motion ended a “disgraceful … US-style scare campaign”. “I said from day one, it was not part of our plan,” Crisafulli said. “I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.” Crisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion. The opposition leader, Steven Miles, said the move was “extraordinary”. “Mr Speaker, these are extraordinary scenes. I thought I’d never seen anything as extraordinary as what they did at the last sitting. 
But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].” Several Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.\n“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible. “This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. Shame.” Crossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue. It has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death. At a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”. “I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said. UQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament. “But this is quite odd. Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said. “It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.” Orr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity. 
All 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it."
$blocks$body[[1]]$attributes
named list()
$blocks$body[[1]]$published
[1] TRUE
$blocks$body[[1]]$createdDate
[1] "2024-12-10T03:48:32Z"
$blocks$body[[1]]$firstPublishedDate
[1] "2024-12-10T04:00:02Z"
$blocks$body[[1]]$publishedDate
[1] "2024-12-10T05:05:25Z"
$blocks$body[[1]]$lastModifiedDate
[1] "2024-12-10T05:05:24Z"
$blocks$body[[1]]$contributors
list()
$blocks$body[[1]]$elements
$blocks$body[[1]]$elements[[1]]
$blocks$body[[1]]$elements[[1]]$type
[1] "text"
$blocks$body[[1]]$elements[[1]]$assets
list()
$blocks$body[[1]]$elements[[1]]$textTypeData
$blocks$body[[1]]$elements[[1]]$textTypeData$html
[1] "<p>The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”.</p> \n<p>The motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order.</p> \n<p>Opposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”.</p> \n<p>There was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through.</p> \n<ul> \n <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s breaking news email</a></strong></p></li> \n</ul> \n<p>The issue <a href=\"https://www.theguardian.com/australia-news/2024/oct/13/queensland-election-2024-lnp-abortion-policy-david-crisafulli\">dominated the recent Queensland election campaign</a>, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time.</p> \n<p>Crisafulli said his motion ended a “disgraceful … US-style scare campaign”.</p> \n<p>“I said from day one, it was not part of our plan,” Crisafulli said.</p> \n<p>“I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.”</p> \n<p>Crisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion.</p> \n<p>The opposition leader, Steven Miles, said the move was “extraordinary”.</p> \n<p>“Mr Speaker, these are extraordinary scenes. 
I thought I’d never seen anything as extraordinary <a href=\"https://www.theguardian.com/australia-news/2024/nov/28/queensland-indigenous-truth-telling-inquiry-head-joshua-creamer\">as what they did at the last sitting</a>. But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].”</p> \n<p>Several Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.</p>"
$blocks$body[[1]]$elements[[2]]
$blocks$body[[1]]$elements[[2]]$type
[1] "rich-link"
$blocks$body[[1]]$elements[[2]]$assets
list()
$blocks$body[[1]]$elements[[2]]$richLinkTypeData
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$url
[1] "https://www.theguardian.com/environment/2024/dec/09/anti-renewable-energy-campaigns-liberal-coalition-labor-ntwnfb"
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$originalUrl
[1] "https://www.theguardian.com/environment/2024/dec/09/anti-renewable-energy-campaigns-liberal-coalition-labor-ntwnfb"
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$linkText
[1] "How anger at Australia’s rollout of renewables is being hijacked by a new pro-nuclear network"
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$linkPrefix
[1] "Related: "
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$role
[1] "thumbnail"
$blocks$body[[1]]$elements[[3]]
$blocks$body[[1]]$elements[[3]]$type
[1] "text"
$blocks$body[[1]]$elements[[3]]$assets
list()
$blocks$body[[1]]$elements[[3]]$textTypeData
$blocks$body[[1]]$elements[[3]]$textTypeData$html
[1] "<p>“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible.</p> \n<p>“This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. Shame.”</p> \n<p>Crossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue.</p> \n<p>It has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death.</p> \n<p>At a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”.</p> \n<p>“I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said.</p> \n<p>UQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament.</p> \n<p>“But this is quite odd. Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said.</p> \n<p>“It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.”</p> \n<p>Orr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity.</p> \n<p>All 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it.</p>"
$blocks$totalBodyBlocks
[1] 1
$isHosted
[1] FALSE
$pillarId
[1] "pillar/news"
$pillarName
[1] "News"
time <- lubridate::ymd_hms(res$webPublicationDate)
time
[1] "2024-12-10 03:48:32 UTC"
headline <- res$webTitle
headline
[1] "Queensland parliament passes ‘unprecedented’ gag on abortion debate"
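The lines above can be wrapped into a first version of the wrangling function. The sketch below is an illustration only: parse_article is a made-up name, the date is parsed with base R's as.POSIXct instead of lubridate to keep the example dependency-free, and res_mock is a shortened stand-in for a real article with a simplified headline.

```r
# Turn one article (a nested list) into a one-row data frame
# (parse_article is a hypothetical helper)
parse_article <- function(res) {
  data.frame(
    id = res$id,
    time = as.POSIXct(
      res$webPublicationDate,
      format = "%Y-%m-%dT%H:%M:%SZ",
      tz = "UTC"
    ),
    headline = res$webTitle,
    stringsAsFactors = FALSE
  )
}

# Shortened mock of the first article, for illustration only
res_mock <- list(
  id = "world/2024/dec/10/queensland-parliament-passes-unprecedented-gag-on-abortion-debate",
  webPublicationDate = "2024-12-10T03:48:32Z",
  webTitle = "Queensland parliament passes unprecedented gag on abortion debate"
)

parse_article(res_mock)
```

Applied to every element of the results list, the rows could then be combined, for example with do.call(rbind, ...) or dplyr::bind_rows().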
Parsing the response: building a data wrangling function
So far so good, but where is the text? It seems it is stored in these “blocks” -> “body” elements. Let’s have a look:
pluck(res, "blocks", "body")
[[1]]
[[1]]$id
[1] "6757a1a68f0805d974b99384"
[[1]]$bodyHtml
[1] "<p>The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”.</p> <p>The motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order.</p> <p>Opposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”.</p> <p>There was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through.</p> <ul> <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s breaking news email</a></strong></p></li> </ul> <p>The issue <a href=\"https://www.theguardian.com/australia-news/2024/oct/13/queensland-election-2024-lnp-abortion-policy-david-crisafulli\">dominated the recent Queensland election campaign</a>, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time.</p> <p>Crisafulli said his motion ended a “disgraceful … US-style scare campaign”.</p> <p>“I said from day one, it was not part of our plan,” Crisafulli said.</p> <p>“I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.”</p> <p>Crisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion.</p> <p>The opposition leader, Steven Miles, said the move was “extraordinary”.</p> <p>“Mr Speaker, these are extraordinary scenes. 
I thought I’d never seen anything as extraordinary <a href=\"https://www.theguardian.com/australia-news/2024/nov/28/queensland-indigenous-truth-telling-inquiry-head-joshua-creamer\">as what they did at the last sitting</a>. But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].”</p> <p>Several Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.</p> <aside class=\"element element-rich-link element--thumbnail\"> <p> <span>Related: </span><a href=\"https://www.theguardian.com/environment/2024/dec/09/anti-renewable-energy-campaigns-liberal-coalition-labor-ntwnfb\">How anger at Australia’s rollout of renewables is being hijacked by a new pro-nuclear network</a> </p> </aside> <p>“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible.</p> <p>“This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. 
Shame.”</p> <p>Crossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue.</p> <p>It has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death.</p> <p>At a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”.</p> <p>“I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said.</p> <p>UQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament.</p> <p>“But this is quite odd. Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said.</p> <p>“It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.”</p> <p>Orr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity.</p> <p>All 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it.</p>"
[[1]]$bodyTextSummary
[1] "The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”. The motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order. Opposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”. There was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through. Sign up for Guardian Australia’s breaking news email The issue dominated the recent Queensland election campaign, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time. Crisafulli said his motion ended a “disgraceful … US-style scare campaign”. “I said from day one, it was not part of our plan,” Crisafulli said. “I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.” Crisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion. The opposition leader, Steven Miles, said the move was “extraordinary”. “Mr Speaker, these are extraordinary scenes. I thought I’d never seen anything as extraordinary as what they did at the last sitting. 
But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].” Several Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.\n“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible. “This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. Shame.” Crossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue. It has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death. At a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”. “I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said. UQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament. “But this is quite odd. Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said. “It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.” Orr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity. 
All 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it."
[[1]]$attributes
named list()
[[1]]$published
[1] TRUE
[[1]]$createdDate
[1] "2024-12-10T03:48:32Z"
[[1]]$firstPublishedDate
[1] "2024-12-10T04:00:02Z"
[[1]]$publishedDate
[1] "2024-12-10T05:05:25Z"
[[1]]$lastModifiedDate
[1] "2024-12-10T05:05:24Z"
[[1]]$contributors
list()
[[1]]$elements
[[1]]$elements[[1]]
[[1]]$elements[[1]]$type
[1] "text"
[[1]]$elements[[1]]$assets
list()
[[1]]$elements[[1]]$textTypeData
[[1]]$elements[[1]]$textTypeData$html
[1] "<p>The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”.</p> \n<p>The motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order.</p> \n<p>Opposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”.</p> \n<p>There was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through.</p> \n<ul> \n <li><p><strong><a href=\"https://www.theguardian.com/email-newsletters?CMP=copyembed\">Sign up for Guardian Australia’s breaking news email</a></strong></p></li> \n</ul> \n<p>The issue <a href=\"https://www.theguardian.com/australia-news/2024/oct/13/queensland-election-2024-lnp-abortion-policy-david-crisafulli\">dominated the recent Queensland election campaign</a>, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time.</p> \n<p>Crisafulli said his motion ended a “disgraceful … US-style scare campaign”.</p> \n<p>“I said from day one, it was not part of our plan,” Crisafulli said.</p> \n<p>“I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.”</p> \n<p>Crisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion.</p> \n<p>The opposition leader, Steven Miles, said the move was “extraordinary”.</p> \n<p>“Mr Speaker, these are extraordinary scenes. 
I thought I’d never seen anything as extraordinary <a href=\"https://www.theguardian.com/australia-news/2024/nov/28/queensland-indigenous-truth-telling-inquiry-head-joshua-creamer\">as what they did at the last sitting</a>. But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].”</p> \n<p>Several Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.</p>"
[[1]]$elements[[2]]
[[1]]$elements[[2]]$type
[1] "rich-link"
[[1]]$elements[[2]]$assets
list()
[[1]]$elements[[2]]$richLinkTypeData
[[1]]$elements[[2]]$richLinkTypeData$url
[1] "https://www.theguardian.com/environment/2024/dec/09/anti-renewable-energy-campaigns-liberal-coalition-labor-ntwnfb"
[[1]]$elements[[2]]$richLinkTypeData$originalUrl
[1] "https://www.theguardian.com/environment/2024/dec/09/anti-renewable-energy-campaigns-liberal-coalition-labor-ntwnfb"
[[1]]$elements[[2]]$richLinkTypeData$linkText
[1] "How anger at Australia’s rollout of renewables is being hijacked by a new pro-nuclear network"
[[1]]$elements[[2]]$richLinkTypeData$linkPrefix
[1] "Related: "
[[1]]$elements[[2]]$richLinkTypeData$role
[1] "thumbnail"
[[1]]$elements[[3]]
[[1]]$elements[[3]]$type
[1] "text"
[[1]]$elements[[3]]$assets
list()
[[1]]$elements[[3]]$textTypeData
[[1]]$elements[[3]]$textTypeData$html
[1] "<p>“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible.</p> \n<p>“This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. Shame.”</p> \n<p>Crossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue.</p> \n<p>It has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death.</p> \n<p>At a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”.</p> \n<p>“I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said.</p> \n<p>UQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament.</p> \n<p>“But this is quite odd. Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said.</p> \n<p>“It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.”</p> \n<p>Orr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity.</p> \n<p>All 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it.</p>"
Parsing the response: building a data wrangling function
It seems the API returns articles as HTML strings. Luckily, we know how to extract text from that 😎
[1] "The Queensland parliament has been banned from debate on abortion for four years after an ambush motion by the premier, David Crisafulli, in a move labelled “unprecedented”.\n\nThe motion also requires any “motion or amendment” seeking to have the house “express its views” on abortion be ruled out of order.\n\nOpposition and crossbench MPs labelled the move “extraordinary” and “unprecedented”.\n\nThere was no notice of the move before Crisafulli introduced the motion into the parliament after question time. Just half an hour was set aside for debate before he used the Liberal National party’s majority to ram it through.\n\nSign up for Guardian Australia’s breaking news email\n\nThe issue dominated the recent Queensland election campaign, with Labor repeatedly warning the LNP would roll back their historic 2018 legalisation legalising the procedure for the first time.\n\nCrisafulli said his motion ended a “disgraceful … US-style scare campaign”.\n\n“I said from day one, it was not part of our plan,” Crisafulli said.\n\n“I said there will be no changes. And despite that – Labor knew this – and despite that, the social media tsunami, the grubby phone calls continued unabated. They spent millions of dollars on a disgraceful scare campaign.”\n\nCrisafulli failed to rule out a conscience vote on the issue, despite being asked dozens of times to do so. Many of his MPs have confirmed they are anti-abortion.\n\nThe opposition leader, Steven Miles, said the move was “extraordinary”.\n\n“Mr Speaker, these are extraordinary scenes. I thought I’d never seen anything as extraordinary as what they did at the last sitting. 
But this, with no notice, no discussion, no advice to the media that it was even coming, such a grubby, grubby treatment [of a serious issue].”\n\nSeveral Labor MPs pointed out that the motion prevented the parliament from expanding legislative protections for abortion services.\n\nRelated: How anger at Australia’s rollout of renewables is being hijacked by a new pro-nuclear network\n\n“If there is further developments by the TGA to further strengthen scope of practice for health professionals, to make it easier for women, this house could not debate that issue,” Labor MP Shannon Fentiman said. “If women need more protection from attending abortion clinics, those reforms would not be possible.\n\n“This is appalling, it is unprecedented and they should be ashamed for being so obviously, so obviously against women’s rights in this state that they have to gag their own members for four years. Shame.”\n\nCrossbench MP Robbie Katter has repeatedly vowed to reintroduce the party’s “babies born alive” bill, which regulated abortion providers, a move that would almost certainly have forced a vote on the issue.\n\nIt has been a longstanding practice of the LNP party room to grant MPs a conscience vote – allowing them to vote freely without influence of the party – on matters of life or death.\n\nAt a press conference on Tuesday afternoon, KAP MPs warned that it was the “death of democracy”.\n\n“I think Queenslanders today should be mourning the death of democracy here in the Queensland parliament,” KAP MP Nick Dametto said.\n\nUQ law professor Graeme Orr said the motion lawfully amended parliament’s sessional orders for the term of the parliament.\n\n“But this is quite odd. 
Only two weeks back, the government asked parliament to adopt the sessional orders for this term,” he said.\n\n“It tells Queensland the intention is to brook no debate about changing the Termination of Pregnancy Act.”\n\nOrr said the law was “a bit silly” because it would have to be undone if there was a genuine need to amend the act due to unforseen necessity.\n\nAll 50 LNP MPs present voted for the motion, with 35 Labor MPs and three Katter’s Australia party MPs voting against it."
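The extraction step above can be sketched roughly as follows. This is a minimal sketch, assuming the rvest package is installed; `html_string` is a made-up stand-in for one `textTypeData$html` value, and `html_to_text()` is a hypothetical helper name, not necessarily the one used in the session.

```r
library(rvest)

# Hypothetical stand-in for one `textTypeData$html` string from the response
html_string <- "<p>First paragraph with a <a href=\"https://example.com\">link</a>.</p> \n<p>Second paragraph.</p>"

# Hypothetical helper: parse the HTML fragment and keep only the visible text
html_to_text <- function(html) {
  html |>
    read_html() |> # parse the string as an HTML document
    html_text2()   # extract the text roughly as a browser would display it
}

article_text <- html_to_text(html_string)
article_text
```

Applied to every element of type "text" and pasted together, this would yield one plain-text string per article.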
Parsing the response: finishing the data wrangling function
# A tibble: 1 × 5
id type time headline text
<chr> <chr> <dttm> <chr> <chr>
1 world/2024/dec/10/queensland-parliam… arti… 2024-12-10 03:48:32 Queensl… "The…
We can loop over all articles returned by the API and apply this function to each of them:
map(search_res, parse_response) |>
  bind_rows() # combine the list into one data.frame
We can copy the output from curl_translate() and run it in R. I also added resp_body_json() since we already know the returned data will be JSON.
search <- request("https://members-api.parliament.uk/api/Members/Search") |>
  req_method("GET") |>
  req_url_query(
    Name = "Major",
    skip = "0",
    take = "20",
  ) |>
  req_headers(
    accept = "text/plain",
  ) |>
  req_perform() |>
  resp_body_json() # <- I added this line to process the response
As usual, we get some meta information like totalResults and the data in a list. To make the items more useful, we can bring them into a tabular format.
# A tibble: 1 × 9
id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty
<int> <chr> <chr> <chr> <chr> <chr> <list>
1 119 Major, Mr… Mr John Major Rt Hon John … Mr Major M <named list>
# ℹ 2 more variables: latestHouseMembership <list>, test <chr>
This code is relatively busy, so let’s deconstruct it a little:
tibble wraps the results in a tibble
items is a list, to extract the first element from it, we used pluck(search, "items", 1), but usually we have more than 1 result, so we need to loop over the results using a map_* function
We know what types to expect from our first request, so we choose map_int for integer fields, map_chr for character fields and map for lists
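The wrangling described in the bullets above might look roughly like this. It is a sketch under stated assumptions: `search` is a small mock object that imitates the shape of the real response (the field values are placeholders), so the example runs without calling the API.

```r
library(purrr)
library(tibble)

# Mock response imitating the shape of the Members/Search result (placeholder values)
search <- list(
  totalResults = 1L,
  items = list(
    list(value = list(
      id = 119L,
      nameListAs = "Major, Mr John",
      nameDisplayAs = "Mr John Major",
      gender = "M",
      latestParty = list(id = 4L, name = "Example Party")
    ))
  )
)

items <- pluck(search, "items")

members <- tibble(
  # integer field
  id = map_int(items, function(i) pluck(i, "value", "id")),
  # character fields
  nameListAs = map_chr(items, function(i) pluck(i, "value", "nameListAs")),
  nameDisplayAs = map_chr(items, function(i) pluck(i, "value", "nameDisplayAs")),
  gender = map_chr(items, function(i) pluck(i, "value", "gender")),
  # nested field, kept as a list column
  latestParty = map(items, function(i) pluck(i, "value", "latestParty")),
  # field that does not exist: falls back to the default instead of failing
  test = map_chr(items, function(i) pluck(i, "value", "test", .default = NA_character_))
)

members
```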
Wrangling the data
I included the test column simply to show why we use pluck here instead of e.g., i[["value"]][["id"]]: we can set a default value if nothing is found
many APIs are inconsistent in what they return
if you try to extract a field deep in a list with [[]], you will either get an error that the field does not exist or NULL (which makes the typed map_* calls inside tibble() fail)
returning NA instead makes the parsing safer and is good practice
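The difference can be seen in a small, self-contained sketch. `safe_pluck()` is used by the functions in this section but is not defined in the visible text, so the definition here is an assumption: a thin wrapper around pluck() with NA as the default. `item` is a mock result with one missing value.

```r
library(purrr)

# Assumed definition: pluck() that returns NA when a field is NULL or absent
safe_pluck <- function(x, ...) {
  pluck(x, ..., .default = NA)
}

# Mock item: nameAddressAs is NULL, as it would be for a JSON null
item <- list(value = list(id = 119L, nameAddressAs = NULL))

found <- safe_pluck(item, "value", "id")              # existing field
missing <- safe_pluck(item, "value", "nameAddressAs") # NULL field becomes NA
raw <- item[["value"]][["nameAddressAs"]]             # plain subsetting gives NULL

# A typed map_* call cannot handle NULL, so this raises an error
attempt <- tryCatch(
  map_chr(list(item), function(i) i[["value"]][["nameAddressAs"]]),
  error = function(e) e
)

# With the default in place, the same call returns a length-one NA
safe <- map_chr(list(item), function(i) safe_pluck(i, "value", "nameAddressAs"))
```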
Wrapping the endpoint in a function
APIs are useful because you can request all kinds of information using just a few parameters. This lends itself very well to wrapping specific calls in functions.
# A tibble: 4 × 8
id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty
<int> <chr> <chr> <chr> <chr> <chr> <list>
1 512 Blair, Mr… Mr Tony Blair Rt Hon Tony … Mr Blair M <named list>
2 4182 Blair of … Lord Blair o… The Lord Bla… The Lord Bla… M <named list>
3 4377 Donaldson… Stuart Blair… Stuart Blair… Stuart Blair… M <named list>
4 5076 McDougall… Blair McDoug… Blair McDoug… <NA> M <named list>
# ℹ 1 more variable: latestHouseMembership <list>
search_members("Smith")
# A tibble: 20 × 8
id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender
<int> <chr> <chr> <chr> <chr> <chr>
1 5368 Booth-Smith, L. Lord Booth-S… The Lord Boo… <NA> M
2 727 Buchanan-Smith, Alick Alick Buchan… Rt Hon Alick… <NA> M
3 4756 Clarke-Smith, Brendan Brendan Clar… Brendan Clar… <NA> M
4 2723 Delacourt-Smith of Al… Baroness Del… The Baroness… <NA> F
5 2713 Dixon-Smith, L. Lord Dixon-S… The Lord Dix… The Lord Dix… M
6 152 Duncan Smith, Sir Iain Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M
7 2490 Goldsmith, L. Lord Goldsmi… The Rt Hon. … <NA> M
8 4062 Goldsmith of Richmond… Lord Goldsmi… The Rt Hon. … <NA> M
9 29 Johnson Smith, Sir Ge… Sir Geoffrey… Sir Geoffrey… <NA> M
10 5341 Kyrke-Smith, Laura Laura Kyrke-… Laura Kyrke-… <NA> F
11 4554 McGregor-Smith, B. Baroness McG… The Baroness… <NA> F
12 5273 Naismith, Connor Connor Naism… Connor Naism… <NA> M
13 216 Naysmith, Dr Doug Dr Doug Nays… Dr Doug Nays… Dr Naysmith M
14 4738 Smith, Alyn Alyn Smith Alyn Smith <NA> M
15 95 Smith, Mr Andrew Mr Andrew Sm… Rt Hon Andre… Mr Smith M
16 1564 Smith, Angela Angela Smith Angela Smith Angela Smith F
17 30 Smith, Angela E. Angela E. Sm… Rt Hon Angel… <NA> F
18 4436 Smith, Cat Cat Smith Cat Smith MP Cat Smith F
19 1609 Smith, Chloe Chloe Smith Rt Hon Chloe… Chloe Smith F
20 1292 Smith, Sir Cyril Sir Cyril Sm… Sir Cyril Sm… <NA> M
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>
The Smith search is a little odd since there are surely more than 20 results for this common name.
Wrapping the endpoint in a function: add pagination
Most APIs use pagination when the data matching a query becomes too big
In that case you need to iterate through the pages to get everything
The UK Parliament API handles pagination through two parameters:
skip: The number of records to skip from the first, default is 0
take: The number of records to return, default is 20. Maximum is 20
So to get the second page with the next 20 results, we need to adapt the call:
# A tibble: 20 × 8
id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender
<int> <chr> <chr> <chr> <chr> <chr>
1 5218 Smith, David David Smith David Smith MP <NA> M
2 1267 Smith, Sir Dudley Sir Dudley Smith Sir Dudley Smi… <NA> M
3 4609 Smith, Eleanor Eleanor Smith Eleanor Smith Eleanor Smith F
4 471 Smith, Geraldine Geraldine Smith Geraldine Smith <NA> F
5 4778 Smith, Greg Greg Smith Greg Smith MP <NA> M
6 3960 Smith, Henry Henry Smith Henry Smith Henry Smith M
7 4456 Smith, Jeff Jeff Smith Jeff Smith MP Jeff Smith M
8 681 Smith, John John Smith John Smith John Smith M
9 564 Smith, Mr John Mr John Smith Rt Hon John Sm… <NA> M
10 4118 Smith, Sir Julian Sir Julian Smith Rt Hon Sir Jul… <NA> M
11 2852 Smith, L. The Lord Smith The Rt Hon. th… <NA> M
12 4648 Smith, Laura Laura Smith Laura Smith Laura Smith F
13 541 Smith, Llew Llew Smith Llew Smith <NA> M
14 3928 Smith, Nick Nick Smith Nick Smith MP Nick Smith M
15 4042 Smith, Owen Owen Smith Owen Smith Owen Smith M
16 5301 Smith, Rebecca Rebecca Smith Rebecca Smith … <NA> F
17 639 Smith, Sir Robert Sir Robert Smith Sir Robert Smi… Sir Robert M
18 4478 Smith, Royston Royston Smith Royston Smith Royston Smith M
19 5117 Smith, Sarah Sarah Smith Sarah Smith MP <NA> F
20 1245 Smith, Timothy Timothy Smith Timothy Smith <NA> M
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>
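Because the total number of results is known after the first request, the skip values needed to cover everything can be worked out up front. The following is a small sketch of that arithmetic; `page_offsets()` is a hypothetical helper name and does not call the API.

```r
# Hypothetical helper: which skip values are needed to page through `total` results?
page_offsets <- function(total, take = 20) {
  if (total <= 0) {
    return(numeric(0)) # nothing to fetch
  }
  seq(from = 0, to = total - 1, by = take)
}

page_offsets(52) # 0 20 40: three requests cover 52 results
page_offsets(20) # 0: a single request is enough
```

A while loop that increases skip by 20 until it reaches the total visits exactly these offsets.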
Wrapping the endpoint in a function: add pagination
Based on this we can adapt the function
search_members <- function(name) {
  # make the initial request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name,
      take = 20,
      skip = 0
    ) |>
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |>
    resp_body_json()

  # checking the total and setting things up for pagination
  total <- resp$totalResults
  message(total, " results found")
  skip <- 20 # the first 20 results were already retrieved above
  page <- 1

  # extract initial results
  items <- pluck(resp, "items")

  # while loops are repeated until the condition inside is FALSE
  while (total > skip) {
    page <- page + 1

    # we print a little status message to let the user know work is ongoing
    message("\t...fetching page ", page)

    # we retrieve the next page by adding an increasing skip
    resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
      req_method("GET") |>
      req_url_query(
        Name = name,
        skip = skip,
        take = 20
      ) |>
      req_headers(
        accept = "text/plain",
      ) |>
      req_throttle(rate = 1) |> # do not make more than one request per second
      req_perform() |>
      resp_body_json()

    # we append the original result with the new items
    items <- c(items, pluck(resp, "items"))

    # increase the skip number, so no empty page is requested at the end
    skip <- skip + 20
  }

  # wrangle
  return(tibble(
    id = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
}
search_members("Smith")
# A tibble: 52 × 8
id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender
<int> <chr> <chr> <chr> <chr> <chr>
1 5368 Booth-Smith, L. Lord Booth-S… The Lord Boo… <NA> M
2 727 Buchanan-Smith, Alick Alick Buchan… Rt Hon Alick… <NA> M
3 4756 Clarke-Smith, Brendan Brendan Clar… Brendan Clar… <NA> M
4 2723 Delacourt-Smith of Al… Baroness Del… The Baroness… <NA> F
5 2713 Dixon-Smith, L. Lord Dixon-S… The Lord Dix… The Lord Dix… M
6 152 Duncan Smith, Sir Iain Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M
7 2490 Goldsmith, L. Lord Goldsmi… The Rt Hon. … <NA> M
8 4062 Goldsmith of Richmond… Lord Goldsmi… The Rt Hon. … <NA> M
9 29 Johnson Smith, Sir Ge… Sir Geoffrey… Sir Geoffrey… <NA> M
10 5341 Kyrke-Smith, Laura Laura Kyrke-… Laura Kyrke-… <NA> F
# ℹ 42 more rows
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>
Adding more parameters
The documentation lists a whole lot of other parameters.
We can copy them into the function to employ them when calling the API.
We can set the defaults to NULL, which means they are ignored by req_url_query when not used
Documentation usually lists the required parameters, for which you shouldn’t set a default
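That NULL arguments are dropped can be checked without sending a request, by building the request object and looking at its URL. This is a small sketch; `build_search()` is a hypothetical helper with only two of the parameters.

```r
library(httr2)

# Hypothetical helper: only builds the request, it does not perform it
build_search <- function(name = NULL, gender = NULL) {
  request("https://members-api.parliament.uk/api/Members/Search") |>
    req_url_query(
      Name = name,
      Gender = gender, # left out of the URL when NULL
      take = 20
    )
}

req <- build_search(name = "Smith")
req$url # contains Name and take, but no Gender parameter
```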
search_members <- function(name = NULL,
                           location = NULL,
                           posttitle = NULL,
                           partyid = NULL,
                           house = NULL,
                           constituencyid = NULL,
                           namestartswith = NULL,
                           gender = NULL,
                           membershipstartedsince = NULL,
                           membershipended_membershipendedsince = NULL,
                           membershipended_membershipendreasonids = NULL,
                           membershipindaterange_wasmemberonorafter = NULL,
                           membershipindaterange_wasmemberonorbefore = NULL,
                           membershipindaterange_wasmemberofhouse = NULL,
                           iseligible = NULL,
                           iscurrentmember = NULL,
                           policyinterestid = NULL,
                           experience = NULL) {
  # 1. request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name,
      Location = location,
      PostTitle = posttitle,
      PartyId = partyid,
      House = house,
      ConstituencyId = constituencyid,
      NameStartsWith = namestartswith,
      Gender = gender,
      MembershipStartedSince = membershipstartedsince,
      MembershipEnded.MembershipEndedSince = membershipended_membershipendedsince,
      MembershipEnded.MembershipEndReasonIds = membershipended_membershipendreasonids,
      MembershipInDateRange.WasMemberOnOrAfter = membershipindaterange_wasmemberonorafter,
      MembershipInDateRange.WasMemberOnOrBefore = membershipindaterange_wasmemberonorbefore,
      MembershipInDateRange.WasMemberOfHouse = membershipindaterange_wasmemberofhouse,
      IsEligible = iseligible,
      IsCurrentMember = iscurrentmember,
      PolicyInterestId = policyinterestid,
      Experience = experience,
      take = 20
    ) |>
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |>
    # 2. Parse
    resp_body_json()

  # checking the total and setting things up for pagination
  total <- resp$totalResults
  message(total, " results found")
  skip <- 20
  page <- 1

  # extract initial results
  items <- pluck(resp, "items")

  # while loops are repeated until the condition inside is FALSE
  while (total > skip) {
    page <- page + 1

    # we print a little status message to let the user know work is ongoing
    message("\t...fetching page ", page)

    # we retrieve the next page by adding an increasing skip
    resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
      req_method("GET") |>
      req_url_query(
        Name = name,
        Location = location,
        PostTitle = posttitle,
        PartyId = partyid,
        House = house,
        ConstituencyId = constituencyid,
        NameStartsWith = namestartswith,
        Gender = gender,
        MembershipStartedSince = membershipstartedsince,
        MembershipEnded.MembershipEndedSince = membershipended_membershipendedsince,
        MembershipEnded.MembershipEndReasonIds = membershipended_membershipendreasonids,
        MembershipInDateRange.WasMemberOnOrAfter = membershipindaterange_wasmemberonorafter,
        MembershipInDateRange.WasMemberOnOrBefore = membershipindaterange_wasmemberonorbefore,
        MembershipInDateRange.WasMemberOfHouse = membershipindaterange_wasmemberofhouse,
        IsEligible = iseligible,
        IsCurrentMember = iscurrentmember,
        PolicyInterestId = policyinterestid,
        Experience = experience,
        take = 20,
        skip = skip
      ) |>
      req_headers(
        accept = "text/plain",
      ) |>
      req_perform() |>
      # 2. Parse
      resp_body_json()

    # we append the original result with the new items
    items <- c(items, pluck(resp, "items"))

    # increase the skip number
    skip <- skip + 20
  }

  # wrangle
  return(tibble(
    id = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
}
search_members("Smith", partyid = 4, house = 1, gender = "M", iscurrentmember = TRUE)
# A tibble: 3 × 8
id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty
<int> <chr> <chr> <chr> <chr> <chr> <list>
1 152 Duncan Sm… Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M <named list>
2 4778 Smith, Gr… Greg Smith Greg Smith MP <NA> M <named list>
3 4118 Smith, Si… Sir Julian S… Rt Hon Sir J… <NA> M <named list>
# ℹ 1 more variable: latestHouseMembership <list>
Adding documentation
In its current form, the function is working well, but to find out what the parameters do, you would have to visit the documentation website, which isn’t great. To make this more useful, we should add some documentation. In R, the roxygen2 package handles parsing documentation for packages. We can use it here to add explanations to the parameters. You can easily add roxygen code to a function using the Code menu in RStudio and Insert Roxygen Skeleton:
#' Search for members of the UK Parliament
#'
#' @param name Members where name contains term specified
#' @param location Members where postcode or geographical location matches the term specified
#' @param posttitle Members which have held the post specified
#' @param partyid Members which are currently affiliated with party with party ID
#' @param house Members where their most recent house is the house specified (1 for Commons, 2 for Lords)
#' @param constituencyid Members which currently hold the constituency with constituency id
#' @param namestartswith Members with surname beginning with letter(s) specified
#' @param gender Members with the gender specified
#' @param membershipstartedsince Members who started on or after the date given
#' @param membershipended_membershipendedsince Members who left the House on or after the date given
#' @param membershipended_membershipendreasonids
#' @param membershipindaterange_wasmemberonorafter Members who were active on or after the date specified
#' @param membershipindaterange_wasmemberonorbefore Members who were active on or before the date specified
#' @param membershipindaterange_wasmemberofhouse Members who were active in the house specified (1 for Commons, 2 for Lords)
#' @param iseligible Members currently Eligible to sit in their House
#' @param iscurrentmember TRUE gives you members who are current
#' @param policyinterestid Members with specified policy interest
#' @param experience Members with specified experience
#'
#' @return
#' @export
#'
#' @examples
search_members <- function(name = NULL,
                           location = NULL,
                           posttitle = NULL,
                           partyid = NULL,
                           house = NULL,
                           constituencyid = NULL,
                           namestartswith = NULL,
                           gender = NULL,
                           membershipstartedsince = NULL,
                           membershipended_membershipendedsince = NULL,
                           membershipended_membershipendreasonids = NULL,
                           membershipindaterange_wasmemberonorafter = NULL,
                           membershipindaterange_wasmemberonorbefore = NULL,
                           membershipindaterange_wasmemberofhouse = NULL,
                           iseligible = NULL,
                           iscurrentmember = NULL,
                           policyinterestid = NULL,
                           experience = NULL) {
  # ...
}
Exercises 2
First, review the material and make sure you have a broad understanding of how to:
read the documentation of the UK Parliament API (the documentation is specific to the API, but the Swagger format they use is very common)
translate a curl call
explain what the individual parts of the search_members function are doing
To get more information about an MP, we can use the endpoint “/api/Members/{id}/Biography”
Search for an MP you are interested in with the function above and use the id on the documentation website with “Try it out”
Copy the curl call and translate it into httr2 code
Wrangle the governmentPosts returned in the data into a tabular format
Bonus:
Write a function which lets you request information given an ID and which wrangles the results
Two more interesting endpoints are “/api/Posts/GovernmentPosts” and “/api/Posts/OppositionPosts”. What do they do and how can you request data from them?
Example: Semantic Scholar
What do we want
Get information about scholars and the papers they write
The documentation is shown in the other common documentation format, called ReDoc
I personally prefer Swagger. Fortunately, that format can be produced from the OpenAPI specification linked on the website (you can use ReDoc though if you want)
There is a tool in R which opens a small server on your computer that can display OpenAPI specifications in the Swagger format (URL for Semantic Scholar: https://api.semanticscholar.org/graph/v1/swagger.json)
library(swagger)
browseURL(swagger_index())
Making a first request
We can use one of the examples and convert it into httr2:
res <- request("https://api.semanticscholar.org/graph/v1/author/search?query=adam+smith") |>
  req_perform() |>
  resp_body_json()
View(res)
Parsing the initial request
We note two pieces of meta information that will be helpful later on:
pluck(res, "total")
[1] 662
pluck(res, "next")
[1] 100
The actual data sits in data and is a pretty well behaved list that we can just convert to a tibble:
# A tibble: 100 × 2
authorId name
<chr> <chr>
1 39765778 Adam D. Smith
2 2109352620 Adam M. Smith
3 2109352729 Adam B. Smith
4 2276184838 Adam B. Smith
5 39872837 Adam Smith
6 2109352685 Adam C. Smith
7 2128824945 Adam N. H. Smith
8 2170968519 Adam W. Smith
9 2109352675 Adam C. Smith
10 2158602926 Adam M. Smith
# ℹ 90 more rows
However, the information seems a bit sparse… But we’ll look at that later.
Wrapping the endpoint in a function and add pagination
First we wrap this in a function and add pagination to get all results:
find_scholar <- function(name, verbose = TRUE) {
  # make initial request
  res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
    req_url_query(query = name) |>
    req_perform() |>
    resp_body_json()

  # note total
  total <- pluck(res, "total")

  # display user message
  if (verbose) {
    message("Found ", total, " authors")
  }

  # note offset
  nxt <- pluck(res, "next")

  # wrangle initial data
  data <- pluck(res, "data") |>
    bind_rows()
  page <- 1

  #----- New Stuff -----#
  # loop through pages until no new ones exist
  while (!is.null(nxt)) { # if there are no more results, next is empty
    page <- page + 1
    if (verbose) {
      message("\t...fetching page ", page)
    }

    res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
      req_url_query(
        query = name,
        offset = nxt
      ) |>
      req_throttle(rate = 30 / 60) |> # make only 30 requests per minute
      req_perform() |>
      resp_body_json()

    # get next offset; will be NULL on the last page
    nxt <- pluck(res, "next")

    data_new <- pluck(res, "data") |>
      bind_rows()
    data <- data |>
      bind_rows(data_new)
  }

  return(data)
}
find_scholar("Adam Smith")
# A tibble: 662 × 2
authorId name
<chr> <chr>
1 39765778 Adam D. Smith
2 2118081662 A. Smith
3 2109352620 Adam M. Smith
4 2109352729 Adam B. Smith
5 2276184838 Adam B. Smith
6 2109352675 Adam C. Smith
7 2158602926 Adam M. Smith
8 39872837 Adam Smith
9 2109352685 Adam C. Smith
10 2128824945 Adam N. H. Smith
# ℹ 652 more rows
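The pagination pattern used here (keep requesting until the response no longer contains a next offset) can be tried out without the network. In this sketch, `fake_api()` is a made-up stand-in that mimics the total/next/data shape of the real response, and `fetch_all()` is a hypothetical name for the loop.

```r
# Made-up stand-in for the API: returns `limit` names starting after `offset`
fake_api <- function(offset = 0, limit = 2) {
  all_names <- c("Author 1", "Author 2", "Author 3", "Author 4", "Author 5")
  idx <- seq_along(all_names)
  page <- all_names[idx > offset & idx <= offset + limit]
  # `next` is only present while more results exist, otherwise it is NULL
  nxt <- if (offset + limit < length(all_names)) offset + limit else NULL
  list(total = length(all_names), `next` = nxt, data = as.list(page))
}

fetch_all <- function() {
  res <- fake_api(offset = 0)
  data <- res$data
  nxt <- res[["next"]]

  # keep going until the response no longer reports a next offset
  while (!is.null(nxt)) {
    res <- fake_api(offset = nxt)
    data <- c(data, res$data)
    nxt <- res[["next"]]
  }

  unlist(data)
}

all_authors <- fetch_all()
```

Unlike the skip/take approach, this loop never needs the total; it simply follows the offsets the server hands back.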
So where is the rest of the data?
Semantic Scholar only returns authorId and name by default.
But we also want papers.
The API handles this through the fields parameter, which lets you request additional fields.
The given example is https://api.semanticscholar.org/graph/v1/author/search?query=adam+smith&fields=name,aliases,url,papers.title,papers.year
We are only interested in some of the fields, so let’s build a new request and see what we get:
This structure is a lot more demanding since we have nested content (authors inside papers inside scholars).
Wrangle the data
For most of the wrangling here, we can use the unnest_ functions from the tidyverse:
adam_search <- pluck(resp, "data") |>
  # bind initial data into a tibble
  bind_rows() |>
  # unnest papers list into columns
  unnest_wider(papers) |>
  # unnest authors into rows
  unnest(authors) |>
  # unnest the new authors into columns
  unnest_wider(authors, names_sep = "_") |>
  # fieldsOfStudy is a list within a list, so we call unnest twice
  unnest(fieldsOfStudy, keep_empty = TRUE) |>
  unnest(fieldsOfStudy, keep_empty = TRUE)
We now get several useful columns including the field of study of a paper (which we could use to differentiate between different authors with the same name).
adam_search
# A tibble: 2,122 × 8
authorId name paperId title year fieldsOfStudy authors_authorId
<chr> <chr> <chr> <chr> <int> <chr> <chr>
1 39765778 Adam D. Smith 01670b5c78… The … 2023 <NA> 15089134
2 39765778 Adam D. Smith 01670b5c78… The … 2023 <NA> 39765778
3 39765778 Adam D. Smith 01670b5c78… The … 2023 <NA> 7430051
4 39765778 Adam D. Smith 01670b5c78… The … 2023 <NA> 4704115
5 39765778 Adam D. Smith 01670b5c78… The … 2023 <NA> 3429443
6 39765778 Adam D. Smith 01670b5c78… The … 2023 <NA> 32546788
7 39765778 Adam D. Smith 0671806ef9… Fort… 2023 <NA> 49608903
8 39765778 Adam D. Smith 0671806ef9… Fort… 2023 <NA> 39765778
9 39765778 Adam D. Smith 44fc62276b… US A… 2023 <NA> 2221731705
10 39765778 Adam D. Smith 44fc62276b… US A… 2023 <NA> 17317553
# ℹ 2,112 more rows
# ℹ 1 more variable: authors_name <chr>
Let’s put it all together in an extended function
find_scholar <- function(name,
                         fields = "name,papers.title,papers.year,papers.fieldsOfStudy,papers.authors",
                         limit = 100) {
  # make initial request
  res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
    req_url_query(query = name) |>
    req_url_query(
      fields = fields,
      limit = limit
    ) |>
    req_headers(accept = "application/json") |>
    req_perform() |>
    resp_body_json()

  # note total
  total <- pluck(res, "total")

  # display user message
  message("Found ", total, " authors")

  # note offset
  nxt <- pluck(res, "next")

  # wrangle initial data
  data <- parse_response(res)
  page <- 1

  # loop through pages until no new ones exist
  while (!is.null(nxt)) {
    page <- page + 1
    message("\t...fetching page ", page)

    res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
      req_url_query(
        query = name,
        offset = nxt,
        fields = fields,
        limit = limit
      ) |>
      req_throttle(rate = 30 / 60) |> # make only 30 requests per minute
      req_headers(accept = "application/json") |>
      req_perform() |>
      resp_body_json()

    # get next offset; will be NULL on the last page
    nxt <- pluck(res, "next")

    # parse new pages the same way as the first one so the columns match
    data_new <- parse_response(res)
    data <- data |>
      bind_rows(data_new)
  }

  return(data)
}
Let’s put it all together in an extended function
I separated the parsing function from this to make it easier to read.
parse_response <- function(resp) {
  pluck(resp, "data") |>
    # bind initial data into a tibble
    bind_rows() |>
    # unnest papers list into columns
    unnest_wider(papers) |>
    # unnest authors into rows
    unnest(authors) |>
    # unnest the new authors into columns
    unnest_wider(authors, names_sep = "_") |>
    # fieldsOfStudy is a list within a list, so we call unnest twice
    unnest(fieldsOfStudy, keep_empty = TRUE) |>
    unnest(fieldsOfStudy, keep_empty = TRUE)
}
First, review the material. This example is pretty similar to the last one. But:
it uses a different documentation style called ReDoc, which does not give you curl calls to copy
it uses a different pagination: instead of using the total number of items, we look for new ones until nothing new is returned
we throttle the number of requests
Document the function we just created. This is mainly to let you think about the parameters and how you would describe how they work to someone else.
Use the function to search for a couple of scholars of your choice. Who has the most co-authors and unique papers?
Say you found an author’s ID with the search function. How could you use “/author/{author_id}” and “/author/{author_id}/papers” to request more information about them?
Bonus:
Write a function that wraps “/author/{author_id}”
Reminder: Social Programme
| DATE | Event | Time | Venue |
|---|---|---|---|
| MONDAY 7 July | Meet and Greet - in person | 19:00 start | SU Bar |
| TUESDAY 8 July | Climbing | 18:30 start | Sports Centre |
| WEDNESDAY 9 July | Harold Clarke Speaker Series - hybrid | 18:45 - 20:00 | EBS |
| THURSDAY 10 July | Sports Night | 18:30 - 20:30 | Sports Centre |
| FRIDAY 11 July | Wivenhoe Pub Run | 18:30 start | Wivenhoe pubs |
| MONDAY 14 JULY | SU bar Quiz | 19:00 start | SU Bar |
| TUESDAY 15 JULY | Sports Night | 18:30 - 20:30 | Sports Centre |
| WEDNESDAY 16 JULY | Harold Clarke Speaker Series - hybrid | 18:30 | EBS |
| THURSDAY 17 JULY | Farewell Party Karaoke | 20:30 - 23:30 | SU Bar |
Wrap Up
Save some information about the session for reproducibility.